Physiologically Motivated Audio-Visua
نویسنده
چکیده
An audio-visual localisation and tracking system for meeting scenarios is presented which draws its inspiration from neurobiological processing. Meetings are recorded by a KEMAR binaural manikin and a single camera placed directly above the manikin. Source localisation from the binaural audio and face, object and motion locations from the video frames are used as input to two linked neural oscillator networks. The strength of the connections between the two networks determines the mapping between activity at a particular audio azimuth and activity at a particular visual frame column. A Hebbian learning rule is used to establish the connection strengths. The combined network segments the video and audio features and then produces audio-visual groupings on the basis of common spatial location. The audio-visual groupings are tracked through time using a mechanism based upon that of the human oculomotor system which incorporates smooth pursuit and saccadic movement.
منابع مشابه
A Physiologically Inspired Method for Audio Classification
We explore the use of physiologically inspired auditory features with both physiologically motivated and statistical audio classification methods. We use features derived from a biophysically defensible model of the early auditory system for audio classification using a neural network classifier. We also use a Gaussian-mixture-model (GMM)-based classifier for the purpose of comparison and show ...
متن کاملPotential relevance of audio-visua computational
The purpose of this study was to examine typically developing infants’ integration of audio-visual sensory information as a fundamental process involved in early word learning. One hundred sixty pre-linguistic children were randomly assigned to watch one of four counterbalanced versions of audio-visual video sequences. The infants’ eye-movements were recorded and their looking behavior was anal...
متن کاملPhysiologically motivated audio-visual localisation and tracking
An audio-visual localisation and tracking system for meeting scenarios is presented which draws its inspiration from neurobiological processing. Meetings are recorded by a KEMAR binaural manikin and a single camera placed directly above the manikin. Source localisation from the binaural audio and face, object and motion locations from the video frames are used as input to two linked neural osci...
متن کاملReal-time Synthesis of Chinese Visua using MPEG-4 FAP Features in a
This paper describes our initial work in developing a real-time audio-visual Chinese speech synthesizer with a 3D expressive avatar. The avatar model is parameterized according to the MPEG-4 facial animation standard [1]. This standard offers a compact set of facial animation parameters (FAPs) and feature points (FPs) to enable realization of 20 Chinese visemes and 7 facial expressions (i.e. 27...
متن کاملPhysiologically-motivated synchrony-based processing for robust automatic speech recognition
This paper describes the structure and performance of a new signal processing scheme, motivated by the physiology of the peripheral auditory system, that improves speech recognition accuracy in the presence of broadband noise. An important attribute of the peripheral processing is a novel mechanism to represent the cycle-by-cycle synchrony in the response of low-frequency auditory-nerve fibers,...
متن کامل